1 Research Objectives and Motivation


The  main  goal  of  the  project  is  to  develop  fundamental  algorithmic  techniques  that  can  be 
applied  to  problems  that  arise  in  the  context  of  high-speed  communication  networks.  The 
explosion  in  the  size  and  complexity  of  networks,  together  with  QoS  requirements  needed  by 
some  of  the  new  services  raises  many  new  challenging  problems. 

Network  topology  design  is  one  of  the  basic  problems  faced  by  companies  that  are  trying 
to  build  a  new  infrastructure  or  to  expand  the  existing  one.  New  tools  are  needed  in  order  to 
decide,  in  a  principled  way,  on  where  to  put  the  “concentrators”,  which  communication  links 
to  purchase/lease,  and  which  communication  equipment  to  buy.  Essentially  all  of  the  problems 
in  this  class  are  NP-hard  and  hence  we  have  to  design  approximation  algorithms.  The  goal  is 
to  design  efficient  algorithms  that  have  provable  performance  guarantees  (i.e.  approximation 
factors).  Here  by  “efficient”  we  do  not  mean  just  “polynomial  time”.  Since  the  size  of  input  is 
usually  very  large,  it  is  important  to  design  algorithms  that  run  in  nearly  linear  time. 

Once  the  network  topology  is  designed,  the  next  step  is  to  decide  where  to  place  servers 
and  caches.  Server  placement  should  take  into  account  the  topology  of  the  network  and  should 
minimize  the  delay  experienced  by  the  customers.  Distributing  (and,  possibly,  replicating) 
applications  and  data  among  the  servers  should  take  into  account  the  load  placed  on  the  servers. 
The  goal  is  to  minimize  the  total  expense  of  purchasing,  installing,  and  maintaining  the  servers. 

Since the resources (e.g. bandwidth) in a communication network are limited, it is important
to divide them among users in a fair way. For bandwidth, the standard fairness definition
is max-min fairness. Roughly speaking, a bandwidth allocation is considered fair if “poor”
connections cannot increase their bandwidth by stealing from “rich” connections. We showed
that this definition is not applicable when different routes can be chosen for different types
of connections, and worked on several algorithms that achieve approximately fair routing and
bandwidth allocation in the online setting.


2  Results 


Network Topology Design A significant part of the cost of building a new network infrastructure
is in purchasing/leasing the communication links and the associated communication
equipment.  Due  to  economies  of  scale,  the  cost  structure  is  usually  concave.  For  example, 
installing  a  DS-3  link  is  usually  much  cheaper  than  installing  30  separate  T1  links  and  using  a 
reverse  multiplexer;  installing  an  OC-3  link  is  usually  cheaper  than  purchasing  3  separate  DS-3 
links. In general, the higher the bandwidth of the link, the lower the cost per megabit per
second  when  we  take  into  account  both  the  cost  of  the  link  and  the  cost  of  the  communication 
equipment  at  the  endpoints  of  the  link. 
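
To make the economies-of-scale point concrete, the following minimal sketch models the cost of carrying a given bandwidth on one link as the cheapest of several pipe types, each with a fixed cost and a per-Mbps cost. The pipe names and all prices are hypothetical and serve only to illustrate why the resulting cost per megabit per second falls as bandwidth grows.

```python
# Hypothetical pipe types: (name, fixed cost, incremental cost per Mbps).
# The numbers are illustrative only and do not come from real tariffs.
PIPES = [
    ("small", 1.0, 1.0),
    ("medium", 10.0, 0.4),
    ("large", 40.0, 0.1),
]


def link_cost(bandwidth):
    """Cheapest way to carry `bandwidth` Mbps on one link with a single pipe type."""
    if bandwidth <= 0:
        return 0.0
    # The minimum of several increasing linear functions is concave.
    return min(fixed + incr * bandwidth for _, fixed, incr in PIPES)


def cost_per_mbps(bandwidth):
    """Average cost per Mbps at the given bandwidth (bandwidth must be positive)."""
    return link_cost(bandwidth) / bandwidth
```

With these made-up prices, one link carrying 40 Mbps is cheaper than two links carrying 20 Mbps each, which is exactly the effect that rules out a direct reduction to linear-cost flow.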

The concave cost structure prevents us from applying classical flow techniques directly.
This is in contrast to the convex cost case, where the problem can be reduced to min-cost
flow (or, in some cases, multicommodity flow) by approximating each link with several
linear-cost capacitated links. It is easy to see that even the simplest problems with concave
cost structure are NP-hard. For example, consider the simplest case where link costs are
identical and independent of capacity/bandwidth (i.e. fixed-cost infinite-capacity links). The
problem in this case is exactly the Steiner Tree problem. In other words, the Steiner Tree
problem can be stated as finding an uncapacitated min-cost flow from several sources to a sink
where the cost is a step function, i.e. the cost is zero for zero flow and equal to 1 for any
flow above 0.

Figure 1: Buy-at-bulk input.

The buy-at-bulk network design problem was defined in [SCRS97]. Roughly speaking, the
problem is as follows: Given a graph with concave costs on the edges and a list of source/sink
pairs with bandwidth demands, construct the cheapest network that will support all of the
demands. Awerbuch and Azar [AA97] gave an O(log^2 n) randomized approximation algorithm
for the buy-at-bulk problem. Their algorithm was based on the randomized tree construction
of Bartal [Bar96]. This construction was later improved by Bartal to give an O(log n log log n)
bound [Bar98]. Using the new construction, it is easy to prove that the algorithm in [AA97]
gives a better bound as well. We have shown how to derandomize the latter construction
in [CCG+98], while maintaining the performance bound.

The main deficiency of the Awerbuch/Azar approach is that it requires the cost functions
on all the edges to be the same. Since most networks are not designed “from scratch”,
different possible links may have different cost structures, for example depending on the already
available “dark fiber” or pre-installed copper. Moreover, the uniform cost requirement means that
we cannot solve problems where we have to both design an inexpensive network and locate
the servers at the same time. (Observe that a server can be represented as a special edge with
a different cost function.)

Together with Meyerson and Munagala (PhD students supported by the Contract), we have
considered the buy-at-bulk problem where the costs of different edges can be different. The
main result is a simple O(log n) approximation algorithm for the case where there is only one
source [MMP00].

Figure 2: Buy-at-bulk output.

In fact, we solved a more general problem that can be used as a framework to solve a large
class of network design and server placement problems: Given a graph where each edge has
two different costs a(e) and ℓ(e) and a list of sources {s_i} with demands {d_i}, the goal is to
minimize the following expression:

$$\sum_{e \in T} a(e) + \sum_{i} d_i \cdot \mathrm{dist}_{\ell}(r, s_i)$$

where T is the set of purchased edges and dist_ℓ(r, s_i) is the distance, measured in ℓ, between
the sink r and the source s_i along T. Intuitively, a(e) corresponds to the fixed cost of purchasing
a communication link, and ℓ(e) corresponds to the incremental cost of using more bandwidth
along this link.
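
As a minimal sketch of how such an objective can be evaluated, the function below takes a candidate tree and returns the sum of the fixed costs of its edges plus, for every source, its demand times the incremental-cost distance to the sink. The representation (parent pointers, dictionaries keyed by edge) and all names are assumptions made for illustration and are not taken from the cited papers.

```python
def two_cost_objective(parent, fixed, length, demand, root):
    """Fixed cost of all tree edges plus demand-weighted distance to the root.

    parent : dict node -> parent node in the tree (the root has no entry)
    fixed  : dict (child, parent) -> fixed cost of that tree edge
    length : dict (child, parent) -> incremental (per-unit) cost of that edge
    demand : dict source node -> demand of that source
    """
    fixed_total = sum(fixed[(v, p)] for v, p in parent.items())
    routing_total = 0.0
    for source, d in demand.items():
        node, dist = source, 0.0
        while node != root:          # walk up the tree to the root
            up = parent[node]
            dist += length[(node, up)]
            node = up
        routing_total += d * dist
    return fixed_total + routing_total


# Hypothetical instance: b hangs below a, while a and c attach to the root r.
PARENT = {"a": "r", "b": "a", "c": "r"}
FIXED = {("a", "r"): 5.0, ("b", "a"): 3.0, ("c", "r"): 4.0}
LENGTH = {("a", "r"): 1.0, ("b", "a"): 2.0, ("c", "r"): 1.0}
DEMAND = {"b": 2.0, "c": 1.0}
TOTAL = two_cost_objective(PARENT, FIXED, LENGTH, DEMAND, "r")
```

Here the fixed part is 5 + 3 + 4 = 12 and the routing part is 2 * 3 + 1 * 1 = 7, so TOTAL is 19.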

In  a  large  class  of  problems  that  appear  in  practice  the  fixed  and  incremental  costs  are  not 
arbitrary.  Roughly  speaking,  it  is  usually  not  useful  to  purchase  a  high-capacity  (and  high  fixed- 
cost)  link  unless  we  can  use  at  least  a  constant  fraction  of  the  capacity  of  this  link.  Formally, 
these  problems  fall  under  the  “access  networks”  class. 

Meyerson and Munagala worked on the design of access networks and, together with Guha,
showed an O(1) approximation algorithm in [GMM00]. They also extended the approach to
give a constant approximation algorithm for the more general single-source buy-at-bulk
problem [GMM01b]. In [MMP01a] we show how to extend this approach to obtain the first
constant factor approximation for the online version of this problem, where the “customers”
appear online in a random order.

We have implemented the above algorithms in order to evaluate their performance. Figure 3
shows some of the results. In this figure, n is the number of nodes in the graph and k is the
number of different “pipes” along each link, i.e. k − 1 is the number of breakpoints in the
piece-wise linear approximation to the concave cost. The underlying graph was created by
taking locations of 180 US cities, choosing a random subset of 18 to 50 nodes, and building a
random graph on these nodes. The lengths were taken as real distances between the cities, and
we assumed that we can purchase DS0, DS1, DS3, or OC3 along each link. (For simplification,
we did not consider DS0 for the 50-node graph.)

Figure 3: Access network design - preliminary results

Note that our approaches produce results within 15% of the optimum fractional solution. The
important issue is scalability. As we can see from the figure, the running time of the Linear
Programming solver (we used CPLEX) increases dramatically with the size of the graph, much
faster than the running time of our algorithms.


Network Design and Multicommodity Flow. In [GOPS97, GOPS98] we describe
implementation results of a combinatorial approximation algorithm for the minimum-cost
multicommodity flow problem. This problem involves simultaneously shipping multiple commodities
through a single network so that the total flow obeys arc capacity constraints and has minimum
cost. This problem often arises when one wants to design the topology of a communication
infrastructure and to determine capacities on the links.
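
For reference, a standard arc-based linear-programming formulation of the problem just described can be written as follows. The notation is an assumption introduced here and is not taken from the cited papers: A is the arc set, K the number of commodities, f^k_{uv} the flow of commodity k on arc (u,v), c_{uv} the per-unit cost of the arc (assumed to be the same for all commodities), cap_{uv} its capacity, and b^k_u the net supply of commodity k at node u (positive at sources, negative at sinks, zero elsewhere).

```latex
\begin{align*}
\text{minimize}\quad & \sum_{k=1}^{K} \sum_{(u,v) \in A} c_{uv}\, f^{k}_{uv} \\
\text{subject to}\quad
& \sum_{v : (u,v) \in A} f^{k}_{uv} - \sum_{v : (v,u) \in A} f^{k}_{vu} = b^{k}_{u}
  && \text{for every node } u \text{ and commodity } k, \\
& \sum_{k=1}^{K} f^{k}_{uv} \le \mathrm{cap}_{uv}
  && \text{for every arc } (u,v) \in A, \\
& f^{k}_{uv} \ge 0
  && \text{for every arc } (u,v) \in A \text{ and commodity } k.
\end{align*}
```

The capacity constraints are the only ones that couple the commodities; without them the program splits into K independent min-cost flow problems.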

Multicommodity flow problems can be expressed as linear programs, and most theoretical
and practical algorithms use linear-programming algorithms specialized for the problems'
structures.  Combinatorial  approximation  algorithms  in  [GK95,  KP95,  PST95]  yield  flows  with 
costs  slightly  larger  than  the  minimum  cost  and  use  capacities  slightly  larger  than  the  given 
capacities.  Theoretically,  the  running  times  of  these  algorithms  are  much  less  than  that  of 
linear-programming-based  algorithms. 

We  combined  and  modified  the  theoretical  ideas  in  these  approximation  algorithms  to  yield 
a fast, practical implementation solving the minimum-cost multicommodity flow problem.
Experimentally, the algorithm solved our problem instances (to 1% accuracy) two to three orders
of magnitude faster than the linear-programming package CPLEX and the linear-programming-based
multicommodity flow program PPRN [CN96].


Online  multicast.  In  [GHP98]  we  presented  an  online  algorithm  for  construction  of  multicast 
trees. This paper considered the model where, at any instant in time, a user (residing in one
of the nodes of the given communication network) requests to join an existing multicast group
or to start a new group. The routing protocol should either reject the request or accept it and
allocate  the  requested  bandwidth  along  a  path  connecting  the  new  endpoint  with  the  already 
existing  tree  for  this  group.  Without  loss  of  generality  we  can  assume  that  the  protocol  always 
accepts  requests  to  create  a  new  group.  The  goal  is  to  satisfy  as  many  requests  as  possible. 

The above publication presents the first polylog-competitive algorithm for the general
multicast problem. Our algorithm is randomized since it is impossible to achieve a polylog
competitive ratio by a deterministic algorithm. The ratio of the number of requests accepted by the
optimum offline algorithm to the expected number of requests accepted by our algorithm is
O(log M (log n + log log M) log n), where M is the number of multicast groups and n is the
number of nodes in the graph. If each vertex is allowed to serve at most one multicast group,
the competitive ratio simplifies to O(log^3 n).

A natural question to ask is whether it is possible to make the competitive ratio independent
of M, the number of multicast groups. We address this issue by showing a lower bound of
Ω(log M log n) when M is much larger than the link capacities. This is the first bound for this
problem that is stronger than Ω(log n).


Approximation  through  tree  metrics.  Numerous  NP-hard  problems  on  general  weighted 
graphs  become  much  easier  to  approximate  when  we  restrict  our  attention  to  weighted  trees. 
In [Bar96, Bar98], Bartal gave a randomized polynomial time algorithm to construct a tree such
that, in expectation, the distances between any points on the tree are close to the distances
in the original graph.

Using his algorithm, one can design approximation algorithms for a large class of optimization
problems. Roughly speaking, we first build a tree that approximates the distances, solve
the  problem  on  this  tree  only  (disregarding  the  rest  of  the  edges),  and  reinterpret  the  resulting 
solution  in  terms  of  the  original  graph. 

Since Bartal's construction is randomized, all of the algorithms resulting from the above
approach are randomized. In [CCG+98] we have shown how to derandomize Bartal’s construction.
More precisely, we give an efficient polynomial time algorithm that, given a finite n-point metric
G, constructs O(n log n) trees and a probability distribution μ on them such that the expected
stretch of any edge of G in a tree chosen according to μ is at most O(log n log log n). Our
result establishes that finite metrics can be probabilistically approximated by a small number
of tree metrics. In other words, we show that given an arbitrary n-point weighted graph, we
can construct a polynomial number of trees such that if we pick one of these trees at random,
the expected distance along this tree is at most a factor of O(log n log log n) more than the real
distance in the original graph.
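
The notion of expected stretch can be illustrated with a small sketch. It does not implement Bartal's construction or its derandomization; it only evaluates, for a given distribution over trees, the largest ratio between the expected tree distance and the original distance. The example instance (a unit-weight cycle on four nodes, approximated by the four paths obtained by deleting one edge, each chosen with probability 1/4) and all names are assumptions chosen for illustration.

```python
from itertools import combinations


def tree_distances(n, edges):
    """All-pairs distances in a weighted tree on nodes 0..n-1; edges are (u, v, w)."""
    adj = {u: [] for u in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {}
    for src in range(n):
        seen = {src: 0.0}
        stack = [src]
        while stack:                       # depth-first traversal from src
            u = stack.pop()
            for v, w in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + w
                    stack.append(v)
        for v, d in seen.items():
            dist[(src, v)] = d
    return dist


def max_expected_stretch(n, metric, weighted_trees):
    """Largest expected stretch over all pairs u < v.

    metric         : dict (u, v) -> original distance, for u < v
    weighted_trees : list of (probability, tree edge list)
    """
    tables = [(p, tree_distances(n, edges)) for p, edges in weighted_trees]
    worst = 0.0
    for u, v in combinations(range(n), 2):
        expected = sum(p * table[(u, v)] for p, table in tables)
        worst = max(worst, expected / metric[(u, v)])
    return worst


# Illustrative instance: the 4-cycle 0-1-2-3-0 with unit edge weights.
CYCLE = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 1.0)]
METRIC = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0, (0, 2): 2.0, (1, 3): 2.0}
TREES = [(0.25, [e for e in CYCLE if e != dropped]) for dropped in CYCLE]
STRETCH = max_expected_stretch(4, METRIC, TREES)
```

On this instance each cycle edge keeps length 1 in three of the four paths and becomes a detour of length 3 in the fourth, so its expected stretch is 1.5, even though any single path stretches some edge by a factor of 3.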

Our approach immediately gives deterministic algorithms for many optimization problems.
In particular, it gives a deterministic approximation algorithm for the buy-at-bulk problem.
In this problem we are given pairs of points and capacity demands between them. We are
also given a graph that specifies where we can buy links and how much we pay per link of
a certain capacity. The costs are assumed to be concave, i.e. buying two links of capacity c is more
expensive than buying a single link of capacity 2c. The goal is to construct the cheapest network
that satisfies the capacity demands. The only previously known approximation algorithm for
this problem [AA97] was randomized.


Caching and Data Placement Web and application caches are an essential part of the
infrastructure. Proper use of data replication and caching can dramatically improve the response
time of the system and, in general, the overall “user experience”. Thus, it is important to
develop  principled  tools  to  help  choose  where  to  place  the  caches,  which  data  to  place  in  which 
cache,  and  how  to  manage  the  replication  policies. 

Most  theoretical  work  on  web  caching  [CK99,  AAK99,  Ira97]  has  focused  on  online  page 
replacement policies. The goal is to design a dynamic (online) policy that will not “overflow”
the caches and at the same time will be able to satisfy most user requests for pages from nearby
caches.

Recent experimental observations [BCF+99, ZIRO99, TRS97, CK99] tend to suggest that
the domain demand vectors change at a relatively coarse time scale and hence it is useful
to consider semi-static replication policies that assume statistical knowledge of data access
patterns. More precisely, given access statistics for each data block, the goal is to distribute
the  data  among  the  given  set  of  servers  (replicating  some  of  the  data)  in  order  to  minimize 
the  average  response  time,  measured  as  distance  between  the  user  and  the  page  it  is  trying 
to  access.  This  data  distribution/replication  problem  was  considered  in  [KPR99],  who  showed 
how  to  approximate  within  a  constant  factor  the  average  user  distance  to  a  cache  without 
overflowing  it,  under  some  assumptions  on  the  distance  (delay)  metric. 

In [MMP01b] we proposed to consider another parameter, which we called “user load”.
More  precisely,  we  assumed  that  each  server  has  two  independent  parameters:  the  total  size  of 
the  pages  that  it  can  hold,  and  the  total  number  of  pages  that  it  can  serve  per  second  (user 
load).  Note  that  the  size  of  the  page  does  not  necessarily  refer  to  the  number  of  bytes.  For 
example,  we  can  use  page  size  to  represent  the  amount  of  update  traffic  that  will  be  caused  by 
placing  this  page  in  the  server. 

Our algorithm in [MMP01b] distributes the data among the servers (replicating some of
the data) without exceeding the page size and the load bound by more than an O(log C) factor,
where C is the number of servers. In addition, our algorithm makes sure that, in the worst
case, the average user's network distance from the assigned cache is within a factor of 5 of the
optimum. If we are not allowed to use replication, we can improve these bounds to exceed cache
size and the user threshold by only a factor of 4 while obtaining optimum average
distance-to-cache. We also showed that if we are allowed to choose where to place servers, we can
obtain a constant approximation to the average delay experienced by users and total server cost
without overflowing the caches and without exceeding the user bound.

The above-mentioned approaches apply only to relatively small files. New approaches
are needed in order to cache streaming media. First, the size of some streams (e.g. video
streams) can be extremely large, which means that it is impossible to fit entire streams in a
cache. Second, we are more concerned with the maximum number of simultaneous cache misses
rather than the total number of cache misses, since each cache miss results in a flow of data
from the main server to the cache, increasing the load on the server-cache route.

The following dynamic caching framework (for a single cache and a single server) was considered
in [DS96, HNG+99]. Suppose that there is a request for some data stream at time t_1 and another
request for the same data stream at time t_2 = t_1 + Δ. (In the terminology of [HNG+99], Δ is
the temporal distance between the two requests.) Suppose also that there is enough space in
the cache to store Δ time units of the stream. Then, we can serve both requests using only a
single connection to the server by caching the last Δ time units of the stream that were seen
by the first request. The second request can always obtain from the cache the current stream
data that it needs.
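
The idea can be sketched in a few lines. The function below is not the algorithm of any cited paper; it is a simplified static illustration that counts how many server connections remain when a cache of a given size (measured in time units of stream) is used to bridge the temporal distances between consecutive requests, shortest distances first. It assumes, hypothetically, that all requests are for the same stream and that the stream is long enough for all of them to be active at once.

```python
def server_connections(request_times, cache_capacity):
    """Number of server connections needed for requests to one stream.

    Simplifying assumptions (not from the source): all requests overlap in
    time and share one cache. A request that starts `gap` time units after
    the preceding one can be fed from the cache if the last `gap` time units
    seen by that preceding request are stored.
    """
    times = sorted(request_times)
    gaps = sorted(later - earlier for earlier, later in zip(times, times[1:]))
    connections = len(times)
    remaining = cache_capacity
    for gap in gaps:                 # bridge the shortest gaps first
        if gap > remaining:
            break
        remaining -= gap
        connections -= 1             # this request no longer needs the server
    return connections
```

For two requests at times 0 and 5, a cache holding 5 time units reduces the load to a single connection, whereas a cache holding only 4 time units leaves both connections in place.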

Andrews and Munagala [AM00] (Munagala is a current PhD student) addressed this problem
in  the  case  where  the  requests  arrive  online,  where  the  goal  is  to  minimize  the  maximum 
bandwidth  ever  used  by  the  server.  They  showed  that  unlike  traditional  online  page  replacement 
policies  (where  the  competitive  ratio  depends  on  the  size  of  the  cache  [FKL+91,  ST85,  MS91]), 
it  is  possible  to  obtain  almost  optimal  performance  in  this  model.  They  also  generalized  this 
approach  to  work  for  arbitrarily  many  caches  and  video  clips  in  a  network  setting. 


Routing and Fairness Part of the process of network design and management is the division
of limited resources among the users. One such resource is the available bandwidth. When
dividing  the  bandwidth,  we  have  to  provide  some  fairness  guarantees.  At  the  same  time  we 
would  like  to  optimize  the  total  bandwidth  allocated  to  the  users  (throughput).  It  is  not  hard 
to  show  that  these  two  goals  are  often  contradictory. 

The accepted definition of fairness is max-min fairness [BG92]. Roughly speaking, a bandwidth
allocation is considered fair according to this definition if “poor” sessions cannot
increase their bandwidth by stealing from “richer” sessions that share links with them. Since
this definition assumes that the routes are given, the standard approach consists of two
independent steps. The first step computes the routes, and the second step divides the
bandwidth along these routes in a max-min fair way. In the past we were successful in developing
algorithms that can be used to implement the first step with provable performance
guarantees [AAP93, AAF+93]. Several efficient distributed algorithms that implement the second
step are known as well [AMO96a, AMO96b, AS98, BFCAZ99].
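
For fixed routes, a max-min fair allocation can be computed centrally by the textbook progressive-filling procedure: raise the rates of all sessions at the same speed and freeze a session as soon as one of its links becomes saturated. The sketch below is a minimal centralized version written for illustration; it is not one of the distributed algorithms cited above, and the data layout and names are assumptions. Every session is assumed to use at least one link.

```python
def max_min_fair(capacity, routes):
    """Max-min fair rates for sessions with fixed routes (progressive filling).

    capacity : dict link -> capacity
    routes   : dict session -> non-empty list of links used by the session
    """
    rate = {s: 0.0 for s in routes}
    residual = dict(capacity)
    active = set(routes)
    while active:
        # Count the active sessions crossing each link.
        load = {}
        for s in active:
            for link in routes[s]:
                load[link] = load.get(link, 0) + 1
        # Raise all active rates equally until the first link saturates.
        step = min(residual[link] / count for link, count in load.items())
        for s in active:
            rate[s] += step
        for link, count in load.items():
            residual[link] -= step * count
        saturated = {link for link in load if residual[link] <= 1e-12}
        # Sessions crossing a saturated link are frozen at their current rate.
        active = {s for s in active if not any(l in saturated for l in routes[s])}
    return rate


# Illustrative instance: s2 shares link A with s1 and link B with s3.
RATES = max_min_fair({"A": 10.0, "B": 4.0},
                     {"s1": ["A"], "s2": ["A", "B"], "s3": ["B"]})
```

In this instance link B saturates first and fixes s2 and s3 at rate 2, after which s1 takes the remaining 8 units of link A.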

Recent  advances  in  routing  technology  allow  us  to  aggregate  many  TCP/IP  connections 
together (for example, by hashing on the source/destination addresses) and route each
aggregated flow in a different way. Thus, we need an alternative definition of fairness that, in
addition  to  allowing  changes  in  bandwidth,  allows  rerouting  of  different  flows  onto  alternative 
paths.  Moreover,  we  have  to  take  into  account  that,  as  long  as  the  final  step  implements  exact 
max-min  fairness,  it  is  impossible  to  approximate  globally  optimum  total  throughput.  (Locally 
maximum  throughput,  i.e.  maximum  throughput  without  rerouting,  can  be  approximated  by 
distributed algorithms such as those in [BBR97].)

The  generalized  definition  should  be  equivalent  to  max-min  fairness  in  the  case  where  the 
routes are already known and no rerouting is allowed. Kleinberg et al. [KRT99] proposed
the following definition of fairness in a variable-route scenario. Compute a bandwidth vector
composed of the bandwidths allocated to flows, in increasing order. The ith coordinate is thus
equal to the bandwidth of the flow receiving the ith least bandwidth. Also define a prefix vector
where the ith coordinate is equal to the sum of the bandwidths assigned to the i minimum
flows (i.e. the sum of the first i coordinates of the bandwidth vector). Note
that the first coordinate of the prefix vector represents the bandwidth given to the “most
starved” flow while the last coordinate is equal to the total throughput. An allocation with
lexicographically maximum bandwidth vector is considered globally fair. It is not difficult to
show that the allocation with the lexicographically larger bandwidth vector also has the
lexicographically larger prefix vector. If
routes  are  preassigned,  the  fairest  possible  allocation  of  bandwidths  according  to  this  definition 
is  the  max-min  fair  allocation. 
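
The two vectors are easy to compute, as the following minimal sketch shows. The function names and the example allocations are assumptions made for illustration; Python compares lists lexicographically, which matches the comparison used in the definition.

```python
from itertools import accumulate


def bandwidth_vector(allocation):
    """Bandwidths of all flows in increasing order."""
    return sorted(allocation)


def prefix_vector(allocation):
    """ith entry: total bandwidth of the i flows that receive the least."""
    return list(accumulate(sorted(allocation)))


def is_fairer(alloc_a, alloc_b):
    """True if alloc_a has the lexicographically larger bandwidth vector."""
    return bandwidth_vector(alloc_a) > bandwidth_vector(alloc_b)


# Two hypothetical allocations for the same three flows.
ALLOC_A = [8, 2, 2]
ALLOC_B = [1, 5, 6]
```

Here ALLOC_A has bandwidth vector [2, 2, 8] and prefix vector [2, 4, 12], while ALLOC_B has [1, 5, 6] and [1, 6, 12]. ALLOC_A is the fairer of the two under this definition because its most starved flow receives more, even though both achieve the same throughput of 12.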

Using  the  above  definition  of  global  fairness,  we  can  restate  our  goal  as  follows:  We  need  to 
design  a  bandwidth  allocation  policy  which  assigns  routes  and  bandwidths  so  as  to  guarantee 
that  throughput  is  within  a  small  factor  of  optimal  and  each  term  of  the  bandwidth  vector  is 
within  a  small  factor  of  the  globally  fair  bandwidth  vector. 

Unfortunately,  the  above  stated  criterion  does  not  always  lead  to  reasonable  allocation  of 
bandwidth.  There  are  cases  where  one  can  start  with  a  solution  which  achieves  close  to  optimal 
throughput  and  assigns  every  flow  close  to  its  globally  fair  bandwidth,  and  where  the  bandwidth 
assigned to many “starved” flows can be increased by a polynomial factor, while reducing the
total  throughput  by  less  than  a  constant  factor. 

This  indicates  that  we  need  to  modify  our  goal.  Instead  of  trying  to  approximate  the 
throughput  and  the  globally  fair  bandwidth  vector,  we  would  like  to  simultaneously  approximate 
all  of  the  possible  bandwidth  vectors.  More  precisely,  the  goal  is  to  design  an  algorithm  that 
assigns  routes  and  bandwidths  such  that  its  corresponding  bandwidth  vector  is  coordinate-wise 
at least a 1/γ fraction of any possible bandwidth vector. A γ-competitive bandwidth allocation
guarantees, for any b, that the number of connections which were assigned bandwidth > b is
not lower than the number of connections which were assigned bandwidth > γb in any legal
solution. It is fairly straightforward to show that this results in a prefix vector that is also
coordinate-wise at least the same 1/γ fraction of any possible prefix vector. Thus a γ-competitive
bandwidth allocation also achieves throughput which is within a γ factor of optimum.
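
The coordinate-wise comparison can be made concrete with a short sketch. The function below compares an allocation against one other allocation only, whereas the definition quantifies over every legal solution; the names and example values are assumptions chosen for illustration.

```python
from itertools import accumulate


def competitive_factor(ours, other):
    """Smallest factor g such that sorted(ours)[i] >= sorted(other)[i] / g for all i."""
    a, b = sorted(ours), sorted(other)
    if len(a) != len(b):
        raise ValueError("both allocations must cover the same flows")
    factor = 0.0
    for x, y in zip(a, b):
        if y == 0:
            continue                     # nothing to match at this coordinate
        if x == 0:
            return float("inf")          # a starved flow cannot be scaled up
        factor = max(factor, y / x)
    return factor


def prefix_factor(ours, other):
    """The same comparison applied to the prefix vectors."""
    a = list(accumulate(sorted(ours)))
    b = list(accumulate(sorted(other)))
    return max((y / x for x, y in zip(a, b) if y > 0 and x > 0), default=0.0)
```

For ours = [8, 2, 2] and other = [1, 5, 6] the bandwidth vectors are [2, 2, 8] and [1, 5, 6], so the factor is 5 / 2 = 2.5, while the prefix vectors [2, 4, 12] and [1, 6, 12] give only 1.5, in line with the observation that the prefix vector is approximated at least as well as the bandwidth vector.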

As we have observed in [GMP01], another way to explain why the above definition of fairness
is reasonable is to relate it to the notion of majorization, studied by Hardy, Littlewood and
Pólya [HLP29, HLP52]. Let x and y be two valid resource assignment vectors, where the
ith coordinate determines the amount of some limited resource (not necessarily bandwidth)
allocated to user i. We say that x is majorized by y if, for every k, the sum of the k smallest
components of x is not less than the sum of the k smallest components of y. Intuitively, if the
above holds, then x is “not worse” or “at least as fair” as y, since the poor users with respect to
the x resource distribution get at least as much as the poor users with respect to the y distribution.

The above intuition can be formalized by using the fact that if x is majorized by y then for
any function f that is symmetric and concave in its arguments, we have f(x) ≥ f(y).

Suppose  there  exists  some  utility  function  which  measures  the  fairness  of  an  allocation.  It  is 
easy to see that under any “natural” definition of fairness, such a function has to be symmetric
in its arguments and concave. Thus, maximizing fairness is equivalent to maximizing some
concave  function  (or  minimizing  some  convex  function).  Hence,  under  any  “natural”  definition 
of  fairness,  the  best  solution  will  be  the  one  that  is  majorized  by  all  possible  vectors  describing 
the  resource  distribution. 

Unfortunately, a vector x that is majorized by all possible y vectors does not need to exist.
For example, the set of constraints {x_1 + x_2 + x_3 = 6; 2x_1 + x_2 ≤ 3; x_1, x_2, x_3 ≥ 0} does not
admit such a vector. A natural approach is to relax the definition of majorization. We say
that vector x represents a γ-fair solution if γx is majorized by all possible vectors representing
resource distribution among the users. Intuitively, this means that the k poorest users together
in a globally γ-fair solution must be at least 1/γ times as rich as the k poorest users in any
other allocation.
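
The definition used above (prefix sums of the sorted vectors, with no requirement that the totals agree) translates directly into code. The sketch below checks the relation between two given vectors and its relaxed version; the function names, the tolerance, and the two example allocations are assumptions made for illustration.

```python
from itertools import accumulate


def prefix_sums(x):
    """Sums of the k smallest entries of x, for k = 1, ..., len(x)."""
    return list(accumulate(sorted(x)))


def is_majorized_by(x, y, tol=1e-9):
    """True if, for every k, the k smallest entries of x sum to at least those of y."""
    return all(px >= py - tol for px, py in zip(prefix_sums(x), prefix_sums(y)))


def is_fair_against(x, y, gamma):
    """True if gamma * x is majorized by y, i.e. x passes the relaxed test against y."""
    return is_majorized_by([gamma * v for v in x], y)


# Two hypothetical allocations of 6 units among three users.
X = (1, 1, 4)
Y = (0, 3, 3)
```

Neither X nor Y is majorized by the other: X is better for the single poorest user (1 versus 0), while Y is better for the two poorest users together (3 versus 2). Scaling X by 1.5 is enough to pass the test against Y, which is the kind of slack the relaxed definition allows.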

In  [GMP01]  we  described  the  relationship  between  majorization  and  fairness  and  considered 
the  job  assignment  problem  where  each  job  can  be  assigned  to  a  subset  of  machines.  Here,  the 
limited  resource  is  the  computation  power  of  the  machines  and  the  ith  coordinate  of  the  resource 
assignment  vector  corresponds  to  the  proportion  of  machine  assigned  to  job  i.  We  showed  that 
if the jobs arrive online, the standard greedy algorithm is O(log n)-fair.

In [GMP00] we presented an online algorithm that routes the connections and assigns
bandwidth to each connection such that the resulting γ is O(log^2 n log U), where n is the number
of nodes in the network and U is the maximum number of connections routed by an optimum
algorithm on a single link. Recently, an offline variant of a closely related bandwidth assignment
problem was considered in [KK00], who presented an O(log U) approximation algorithm for the
case where the routes are fixed and all we need to do is decide how much bandwidth to allocate
to each route.

Figure 4 compares a variant of our algorithm in [GMP00] (marked GMP) to an algorithm that
routes over random min-hop paths (marked Random) and to an algorithm based on our earlier
work [AAF+93] that routes along shortest paths with respect to a cost that is exponential in the
load (marked EXP). Experimental results show that the latter algorithm achieves good global
fairness. The graph shows the scaled bandwidth vector, i.e. we divide the i-th coordinate (total
bandwidth assigned to the poorest i users) by i. Note that the leftmost point corresponds to
the minimum bandwidth assigned to a connection, while the rightmost point corresponds to
the average bandwidth per connection, i.e. total throughput divided by the number of connections.
Observe that our total throughput is significantly higher than the throughput achieved by the
other algorithms while our minimum assigned bandwidth does not suffer too much.


Online  routing  of  FTP  requests.  In  [GHPT99]  we  developed  an  algorithm  for  routing 
and  scheduling  large  file  transfers  (e.g.  images)  in  a  general  topology  communication  network. 
The  requests  (to  transfer  a  file)  arrive  online  and  the  goal  is  to  eventually  satisfy  all  the  requests. 
Since  the  bandwidth  of  the  links  in  the  network  is  limited,  it  makes  sense  to  try  to  schedule  the 
transmissions  in  a  way  that  utilizes  the  available  resources  optimally. 

We considered the online ftp problem, which is a formal abstraction of the above file transfer
problem. We assume that each ftp request specifies source/destination nodes and the size of
the file. The goal of the online algorithm is to choose a path that will be used for transmitting
each file, and to decide on the transmission rate. The main difference between this model and
the (well-studied) models for online routing and admission control [GG92, AAF+93, GGK+93,
AAP93] is that here we do not assume that the sources have prespecified transmission rate
requirements, i.e. we can deal with non-streaming types of information.

Figure 4: Comparing optimum fairness to combined fairness/throughput algorithm

Let n be the number of requests and P the maximum ratio between the sizes of the files.
Assume that the smallest request can be processed in one time unit. Let F_max denote the
optimum max-flow, i.e. the smallest value for the maximum time a request spends in the system.
Our main results are algorithms that achieve the optimum max-stretch and the optimum total
flow time using resource augmentation. For the max-stretch algorithm we need to increase
capacities by a factor of O(log P), whereas the total flow time algorithm needs a factor of
O(log F_max) increased capacity. The latter algorithm not only achieves the optimum total
flow time, but simultaneously optimizes many other objective functions, like the maximum flow
time, the total square-of-flow-time, etc.
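
The objective functions named above can be computed from a finished schedule as in the following minimal sketch. The source does not spell out the definition of stretch; the code assumes the usual one, namely the time a request spends in the system divided by its size, with sizes expressed in the same time units. The function name and the tuple layout are likewise assumptions.

```python
def flow_metrics(jobs):
    """Summary objectives for a finished schedule.

    jobs : non-empty list of (arrival, completion, size) with size > 0,
           where size is the transfer time of the file on an idle network
           (an assumed normalization).
    """
    flows = [completion - arrival for arrival, completion, _ in jobs]
    stretches = [(completion - arrival) / size for arrival, completion, size in jobs]
    return {
        "total_flow": sum(flows),
        "max_flow": max(flows),
        "total_square_flow": sum(f * f for f in flows),
        "max_stretch": max(stretches),
    }


# Hypothetical schedule: three transfers given as (arrival, completion, size).
METRICS = flow_metrics([(0, 2, 1), (1, 9, 4), (3, 5, 2)])
```

In this example the flow times are 2, 8 and 2, so the total flow time is 12, the maximum flow time is 8, the total square-of-flow-time is 72, and the maximum stretch is 2.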

To justify the need for giving larger capacities to the online algorithm (i.e. resource
augmentation), we showed polynomial lower bounds on both max-stretch and total flow time for
the  case  where  both  online  and  offline  algorithms  are  using  the  same  capacities. 


Facility Location and k-median Facility location is a classical optimization problem where
we need to “buy” facilities in order to minimize the total cost of the facilities plus the sum over
all customers of the customer-to-facility distances. k-median is a variant of this problem where
we are required to buy exactly k facilities. Publication [AGK+01] describes a local-search
approach for approximately solving the k-median problem. The algorithm is quite practical
and the resulting approximation factor is the best currently known. A fault-tolerant version of
the facility location problem is considered in [GMM01a]. An online version of the facility location
problem, where the customers that need to be served appear online, is considered in [Mey].
A profit-earning variant, where each facility starts “bringing profit” if it serves enough customers,
is considered in [Mey01].
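
To give a feel for what a local-search approach to k-median looks like, here is a minimal sketch of plain single-swap local search: start from an arbitrary set of k facilities and keep exchanging one open facility for one closed facility while the total distance decreases. This is an illustration under simplifying assumptions, not the algorithm or analysis of [AGK+01]; the initial solution, the acceptance rule, and all names are hypothetical.

```python
def kmedian_cost(dist, centers):
    """Sum over customers of the distance to the nearest open facility."""
    return sum(min(row[c] for c in centers) for row in dist)


def local_search_kmedian(dist, k):
    """Single-swap local search; dist[i][j] is the distance from customer i to site j."""
    n_sites = len(dist[0])
    centers = set(range(k))                    # arbitrary initial solution
    cost = kmedian_cost(dist, centers)
    improved = True
    while improved:
        improved = False
        for out in sorted(centers):
            for new in range(n_sites):
                if new in centers:
                    continue
                candidate = (centers - {out}) | {new}
                candidate_cost = kmedian_cost(dist, candidate)
                if candidate_cost < cost - 1e-12:   # accept any strict improvement
                    centers, cost, improved = candidate, candidate_cost, True
                    break
            if improved:
                break
    return centers, cost


# Hypothetical instance: four points on a line serve as both customers and sites.
POSITIONS = [0, 1, 10, 11]
DIST = [[abs(p - q) for q in POSITIONS] for p in POSITIONS]
CENTERS, COST = local_search_kmedian(DIST, 2)
```

Starting from the two leftmost sites (cost 19), a single swap moves one facility to the right-hand cluster and brings the cost down to 2, after which no swap improves the solution.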